To visualize & analyze card usage and car movement data with the employee disappearance incident
This study is based on the Mini-Challenge 2 of the VAST Challenge 2021. In a fiction scenario, there is a natural gas company named “GASTech” operating in the island country if Kronos. The GASTech didn’t do well in environment stewardship. And after an company IPO celebration in January 2014, several employees of GASTech went missing. An environment organization is suspected in the disappearance.
Many of the Abila, Kronos-based employees of GASTech have company cars which are approved for both personal and business use. Those who do not have company cars have the ability to check out company trucks for business use, but these trucks cannot be used for personal business.
Employees with company cars are happy to have these vehicles, because the company cars are generally much higher quality than the cars they would be able to afford otherwise. However, GASTech does not trust their employees. Without the employees? knowledge, GASTech has installed geospatial tracking software in the company vehicles. The vehicles are tracked periodically as long as they are moving.
This vehicle tracking data has been made available to law enforcement to support their investigation. Unfortunately, data is not available for the day the GASTech employees went missing. Data is only available for the two weeks prior to the disappearance.
To promote local businesses, Kronos based companies provide a Kronos Kares benefit card to GASTech employees giving them discounts and rewards in exchange for collecting information about their credit card purchases and preferences as recorded on loyalty cards. This data has been made available to investigators in the hopes that it can help resolve the situation. However, Kronos Kares does not collect personal information beyond purchases.
Use visual analytics to identify which GASTech employees made which purchases and identify suspicious patterns of behavior. Besides, the study must cope with uncertainties that result from missing, conflicting, and imperfect data to make recommendations for further investigation.
Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies? Please limit your answer to 8 images and 300 words.
Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find? Please limit your answer to 8 images and 500 words.
Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data? Please limit your answer to 8 images and 500 words.
Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships. Please limit your response to 8 images and 500 words.
Do you see evidence of suspicious activity? Identify 1- 10 locations where you believe the suspicious activity is occurring, and why Please limit your response to 10 images and 500 words.
The VAST Challenge 2014 has the same scenario with slightly different dataset and questions. The submission repository can be found here.
Various analytic tools were used among the submissions, like SAS EM, D3 and custom tools. From the graph view, the heatmap and time histograms were useful to represent the numerical value under the combination of two discrete/categorical variables, such as the hourly temperature of different regions in one day. Besides, car movement line/dot graph on the map can help to track suspicious activities. And the relationships among individuals are shown represented well in network graphs.
However, almost all graphs were static and readers would find it difficult to explore other parts in graphs which were not specially mentioned by authors. Since the study is displayed on html pages, interactive graphs will be possible and add more details & convenience. For example, the tooltip function can make every data point to have detailed information without checking the axis or drawing additional graphs. The zoom-in and on-click functions allow readers to check or focus on one part in a complex graph with many lines/objects.
Import packages.
We have the consumption records of credit cards and loyalty cards.
loyalty <- read_csv("data/loyalty_data.csv", locale=locale(encoding ="windows-1252"))
cc <- read_csv("data/cc_data.csv", locale=locale(encoding ="windows-1252"))
# The location names contain some special characters, such as "Café", which are not recognized by utf-8 encoding. Thus, special encoding is used in reading data.
Take a glimpse of credit card data and loyalty card data
knitr::kable(cc[c(0:5),],
caption = "Credit Card Usage Data") %>%
kableExtra::kable_paper("hover", full_width = F)
| timestamp | location | price | last4ccnum |
|---|---|---|---|
| 01/06/2014 07:28 | Brew’ve Been Served | 11.34 | 4795 |
| 01/06/2014 07:34 | Hallowed Grounds | 52.22 | 7108 |
| 01/06/2014 07:35 | Brew’ve Been Served | 8.33 | 6816 |
| 01/06/2014 07:36 | Hallowed Grounds | 16.72 | 9617 |
| 01/06/2014 07:37 | Brew’ve Been Served | 4.24 | 7384 |
knitr::kable(loyalty[c(0:5),],
caption = "Loyalty Card Usage Data") %>%
kableExtra::kable_paper("hover", full_width = F)
| timestamp | location | price | loyaltynum |
|---|---|---|---|
| 01/06/2014 | Brew’ve Been Served | 4.17 | L2247 |
| 01/06/2014 | Brew’ve Been Served | 9.60 | L9406 |
| 01/06/2014 | Hallowed Grounds | 16.53 | L8328 |
| 01/06/2014 | Coffee Shack | 11.51 | L6417 |
| 01/06/2014 | Hallowed Grounds | 12.93 | L1107 |
The timestamp in the credit card usage date (“cc”) contains date and time, while the timestamp in the loyal card usage data (“loyalty”) contains only data. Besides, their data type is string, which will be transformed into datetime type.
After that, we will separate day, hour from the datetime feature. It will make it simple to draw graphs later.
The car movement GPS data and the car assignment data are provided.
gps <- read_csv("data/gps.csv")
knitr::kable(gps[c(0:5),],
caption = "GPS Data") %>%
kableExtra::kable_paper("hover", full_width = F)
| Timestamp | id | lat | long |
|---|---|---|---|
| 01/06/2014 06:28:01 | 35 | 36.07623 | 24.87469 |
| 01/06/2014 06:28:01 | 35 | 36.07622 | 24.87460 |
| 01/06/2014 06:28:03 | 35 | 36.07621 | 24.87444 |
| 01/06/2014 06:28:05 | 35 | 36.07622 | 24.87425 |
| 01/06/2014 06:28:06 | 35 | 36.07621 | 24.87417 |
The timestamp in the GPS data also need to be transformed.
Besides, the longitude and latitude will be rounded into 5 digits. It can avoid the inconsistent/inaccurate in GPS data to some extent. And five decimal places implies 1.11 meters accuracy, which is better than 4 or 6 digits (11.1 meter or 0.11 meter accuracy) under this question scenario.
In following sections, we will draw car stop locations as well as car movement paths on the map. Thus, we need to get the car stop locations and car movement from the GPS data.
And in the question background of the challenge page, it mentioned that the vehicles are tracked periodically as long as they are moving.
So the time gap in the time sequential GPS data within one car could indicate that this car stopped at current GPS location. To find places of interest, we will exclude out the time gap less than 3 minutes, which might be that the car stopped to wait for traffic light or just GPS tracker problem.
gps2 <- gps
gps2 <- gps2 %>%
group_by(id) %>%
# use the current timestamp minus the timestamp in a preceding record
mutate(end = Timestamp,
start = lag(Timestamp, default = first(Timestamp),
order_by = Timestamp),
diff_mins = difftime(end, start, units = "mins")) %>%
# diff_mins shows the time gap between the current record and the preceding record
# if the diff_mins > 3 mins, we think the car has stopped over 3 mins and just start running at the current timestamp
mutate(stop = ifelse(diff_mins >= 3, TRUE, FALSE)) %>%
filter(stop == TRUE) %>%
ungroup() %>%
arrange(id, Timestamp)
# rearrange useful features
gps2_stop <- gps2[c(7,6,2,3,4,8,5)]
gps2_stop_sf <- st_as_sf(gps2_stop,
coords = c("long", "lat"), # combine the lo, la
crs = 4326) # 4326 is wgs84 Geographic Coordinate System
The “start” in the “gps2_stop_sf” refers to the start time of this car stop period, while the “end” refers to the end time when the car ends parking and start moving.
However we will find the GPS data has multiple strange car stops. One is shown as follows.
knitr::kable(gps2_stop_sf[c(5:8),],
caption = "Anomalies in Car Stop") %>%
kableExtra::kable_paper("hover", full_width = F)
| start | end | id | diff_mins | day | geometry |
|---|---|---|---|---|---|
| 2014-01-06 17:48:03 | 2014-01-06 19:36:01 | 1 | 107.966667 mins | 6 | POINT (24.88259 36.0665) |
| 2014-01-06 19:42:27 | 2014-01-06 19:49:01 | 1 | 6.566667 mins | 6 | POINT (24.85632 36.07528) |
| 2014-01-06 19:49:01 | 2014-01-06 20:27:01 | 1 | 38.000000 mins | 6 | POINT (24.85627 36.07529) |
| 2014-01-06 20:33:30 | 2014-01-06 22:11:01 | 1 | 97.516667 mins | 6 | POINT (24.8826 36.06649) |
We can see the end time of the 2nd car stop is the same with the start time of the 3rd car stop. But they should be one car stop originally instead of two car stops. There are a lot of such car stop records. And why this occurred is that there is a single GPS record within an original car stop. This record should not be there and will break the car stop into two after the above code logic.
To fix this, we add one more flag feature to exclude every abnormal GPS records within single original car stop. These records have special timestamp: far away from one preceding timestamp and far away from one following timestamp in records of one car. But for the normal car stop, the timestamp, which is also the end time of this car stop period, refer the time when the car starts moving and should be followed by continuous timestamp. Thus, we can exclude those car stops whose following timestamp is far from itself.
We choose 30 seconds as the cutoff. It means that records will be regarded as abnormal and excluded if the car stop period ends at this timestamp (the car should start moving) but there are no car movement GPS data in 30 seconds.
gps2 <- gps
# assign unique index
gps2$idx <- c(1:685169)
gps2_false_stop <- gps2 %>%
group_by(id) %>%
arrange(Timestamp) %>%
mutate(end = Timestamp,
start = lag(Timestamp, default = first(Timestamp)),
diff_mins = difftime(end, start, units = "mins"),
move_time_sec = difftime(lead(Timestamp, n=2), Timestamp, units = "sec")) %>%
filter(diff_mins >= 3 & move_time_sec > 30) %>%
ungroup() %>%
dplyr::select(start, end, idx, id, lat, long)
false_stop_idx <- gps2_false_stop$idx
# exclude these records
gps2_true <- gps2 %>%
filter(!idx %in% false_stop_idx)
# now we can get true stop
gps2_true_stop <- gps2_true %>%
group_by(id) %>%
arrange(Timestamp) %>%
mutate(end = Timestamp,
start = lag(Timestamp, default = first(Timestamp)),
diff_mins = difftime(end, start, units = "mins"),
move_time_sec = difftime(lead(Timestamp, n=2), Timestamp, units = "sec")) %>%
filter(diff_mins >= 3 & move_time_sec < 30) %>%
ungroup() %>%
dplyr::select(start, end, diff_mins, diff_mins, id, lat, long, day, idx)
# concatenate GPS into coords
gps2_stop_sf <- st_as_sf(gps2_true_stop,
coords = c("long", "lat"), # combine the lo, la
crs = 4326) # 4326 is wgs84 Geographic Coordinate
# add unique index
gps2_stop_sf$idx1 <- c(1:nrow(gps2_stop_sf))
For car assignment, most vehicles are assigned one-to-one. Only truck drivers are not assigned cars but are allowed to use available trucks for business purpose.
car_assignments <- read_csv("data/car-assignments.csv")
# check car assignment data
knitr::kable(car_assignments,
caption = "Car Assignment") %>%
kableExtra::kable_paper("hover", full_width = F) %>%
kableExtra::scroll_box(height = "300px")
| LastName | FirstName | CarID | CurrentEmploymentType | CurrentEmploymentTitle |
|---|---|---|---|---|
| Calixto | Nils | 1 | Information Technology | IT Helpdesk |
| Azada | Lars | 2 | Engineering | Engineer |
| Balas | Felix | 3 | Engineering | Engineer |
| Barranco | Ingrid | 4 | Executive | SVP/CFO |
| Baza | Isak | 5 | Information Technology | IT Technician |
| Bergen | Linnea | 6 | Information Technology | IT Group Manager |
| Orilla | Elsa | 7 | Engineering | Drill Technician |
| Alcazar | Lucas | 8 | Information Technology | IT Technician |
| Cazar | Gustav | 9 | Engineering | Drill Technician |
| Campo-Corrente | Ada | 10 | Executive | SVP/CIO |
| Calzas | Axel | 11 | Engineering | Hydraulic Technician |
| Cocinaro | Hideki | 12 | Security | Site Control |
| Ferro | Inga | 13 | Security | Site Control |
| Dedos | Lidelse | 14 | Engineering | Engineering Group Manager |
| Bodrogi | Loreto | 15 | Security | Site Control |
| Vann | Isia | 16 | Security | Perimeter Control |
| Flecha | Sven | 17 | Information Technology | IT Technician |
| Frente | Birgitta | 18 | Engineering | Geologist |
| Frente | Vira | 19 | Engineering | Hydraulic Technician |
| Fusil | Stenig | 20 | Security | Building Control |
| Osvaldo | Hennie | 21 | Security | Perimeter Control |
| Nubarron | Adra | 22 | Security | Badging Office |
| Lagos | Varja | 23 | Security | Badging Office |
| Mies | Minke | 24 | Security | Perimeter Control |
| Herrero | Kanon | 25 | Engineering | Geologist |
| Onda | Marin | 26 | Engineering | Drill Site Manager |
| Orilla | Kare | 27 | Engineering | Drill Technician |
| Borrasca | Isande | 28 | Engineering | Drill Technician |
| Ovan | Bertrand | 29 | Facilities | Facilities Group Manager |
| Resumir | Felix | 30 | Security | Security Group Manager |
| Sanjorge Jr. | Sten | 31 | Executive | President/CEO |
| Strum | Orhan | 32 | Executive | SVP/COO |
| Tempestad | Brand | 33 | Engineering | Drill Technician |
| Vann | Edvard | 34 | Security | Perimeter Control |
| Vasco-Pais | Willem | 35 | Executive | Environmental Safety Advisor |
| Hafon | Albina | NA | Facilities | Truck Driver |
| Hawelon | Benito | NA | Facilities | Truck Driver |
| Hawelon | Claudio | NA | Facilities | Truck Driver |
| Mies | Henk | NA | Facilities | Truck Driver |
| Morlun | Valeria | NA | Facilities | Truck Driver |
| Morlun | Adan | NA | Facilities | Truck Driver |
| Morluniau | Cecilia | NA | Facilities | Truck Driver |
| Nant | Irene | NA | Facilities | Truck Driver |
| Scozzese | Dylan | NA | Facilities | Truck Driver |
Let’s join the car assignment into the car stop locations, which will add more information while viewing the car line/dot graph on the map
For the car movement path, we need to group the GPS data and concatenate them to the simple feature format for the graph. One path is actually one line string with multiple GPS points.
# convert values from numerical to factor data type
gps3 <- gps
gps3$day <- as.factor(gps3$day)
gps3$id <- as_factor(gps3$id)
gps_sf <- st_as_sf(gps3,
coords = c("long", "lat"),
crs = 4326)
# group car paths
gps_path <- gps_sf %>%
group_by(id, day) %>%
summarize(m =mean(Timestamp),
do_union=FALSE) %>%
st_cast("LINESTRING")
The tourist map provided is not georeferenced. And QGIS can help to georeference an image with the ESRI shapefiles (geospatial vector data) of the city.
The process includes:
After the process, we will get a tif file which is a combination of tourist map and georeferenced road map. Then we can plot car movements line with longitude and latitude data on the map.
we need to import the tif file generated by QGIS and display the map.
bgmap <- raster("data/Geospatial/MC2-tourist.tif")
Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies?
To identify popularity, we can calculate the card usage frequency and amount in every locations of different days and hours.
Firstly, let’s plot the frequency of cards in the 14 days. We need to calculate the card usage frequency in different days, convert into data frame, draw their heatmaps and plot together.
# calculate the frequency data frame of credit and loyalty card usage
cc_freq_day <- as.data.frame(xtabs(~location+day, data = cc))
loyalty_freq_day <- as.data.frame(xtabs(~location+day, data = loyalty))
# join the two frequency data frame
freq_day_join <- full_join(cc_freq_day,loyalty_freq_day,by= c("location","day"))
names(freq_day_join) <- c("location","day","CC_Freq","Loyalty_Freq")
# transfer from factors to numeric with original values
freq_day_join$day <- as.numeric(levels(freq_day_join$day))[freq_day_join$day]
# plot the heatmap of credit card usage frequency
p1 <- ggplot(freq_day_join,aes(x=day,y=location))+
geom_tile(aes(fill=CC_Freq))+
scale_fill_gradient(low = "#deeff7", high = "#0D2330")+
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.title=element_blank())
# plot the heatmap of loyalty card usage frequency
p2 <- ggplot(freq_day_join,aes(x=day,y=location))+
geom_tile(aes(fill=Loyalty_Freq))+
scale_fill_gradient(low = "#deeff7", high = "#0D2330")+
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.title=element_blank())
# convert static graph into interactive
plotly::subplot(ggplotly(p1),
ggplotly(p2),
shareY = TRUE)
Figure 1: Daily Frequency of Credit (left) and Loyalty (right) Card Usage
From the card usage frequency (or consumption frequency), we can easily identify that “Katerina’s Café”, “Hippokampos” and “Brew’ve Been Served” are the most popular with almost all squares in deeper color, where the daily consumption frequency is above 10. “Hallowed Grounds” and “Guy’s Gyros” are slightly less popular.
Besides, we can find that “Brew’ve Been Served” and “Hallowed Grounds” are popular every day except weekends (day 11-12, 18-19). The frequency are 0 on weekends, which might because the location is closed on weekends. It’s the same to “Hallowed Grounds”.
On weekends, “Katerina’s Café” and “Hippokampos” are the most popular while other locations might be closed or less consumption these days.
As for anomalies, we can see there is one white line in the graph for loyalty card, corresponding to “Daily Dealz”. This location only have one credit card consumption record on day 13 and no loyalty card record among the two weeks.
The daily frequencies are the same between “Maximum Iron and Steel” and “Kronos Pipe and Irrigation” every day in the two weeks.
To correct these anomalies, we can check the GPS data to make sure who made the only one consumption in “Daily Dealz”. If there were no anomalies after checking, we can just delete this single record in the credit card data. And for the situation between “Maximum Iron and Steel” and “Kronos Pipe and Irrigation”, it’s just coincidence after checking the consumption amount.
Secondly, we can plot the consumption amount instead of frequency. The steps are almost the same.
cc_price_matrix <- tapply(cc$price,cc[,c("location","day")],sum)
cc_price <- reshape2::melt(cc_price_matrix)
cc_price <- na.omit(cc_price)
p1_price <- ggplot(cc_price,aes(x=day,y=location))+
geom_tile(aes(fill=value))+
scale_fill_gradient(low = "#deeff7", high = "#0D2330")+
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.title=element_blank())
ggplotly(p1_price)
Figure 2: Daily Consumption Amount of Credit Card
Figure 3: Daily Consumption Amount of Credit Card (box plot)
The consumption amount differences among locations are much bigger than frequency differences.
Apparently, “Abila Airport” are the place where has the biggest consumption amount. And these consumption occurred on weekdays only.
Besides, “Stewart and Sons Fabrication”, “Nationwide Refinery” and “Abila Airport” also have high consumption amounts. All these locations don’t show high frequency values in previous graphs but have very high daily consumption amounts.
And there are many outliers which might be anomalies. For example, “Frydos Autosupply n’ More” had a daily cc consumption amount ($10455.22) on day 13, which is several times as much as those in other days. And the “Albert’s Fine Clothing” also has a daily consumption outlier on day 17.
What’s more, there are many inconsistencies between amounts in the credit card record and loyalty card record. At “Stewart and Sons Fabrication”, the daily amounts from day 13 to day 16 don’t match in two graphs.
To correct these anomalies, we need to check through the car movement data where the consumption amount outliers exist. It’s to see whether there are activities or other gathering to cause the high consumption. As for the inconsistency in amounts, the possible explanations are there might be someone used only one of the two cards or got cashback in the consumption.
Lastly, we change the time unit from days to hours to analyze the popular locations. Only the timestamp of credit card data contains time, so there are no hourly heatmaps for loyalty card usage.
cc_freq_hour <- as.data.frame(xtabs(~location+hour, data = cc))
# convert factor into number
cc_freq_hour$hour <- as.numeric(levels(cc_freq_hour$hour))[cc_freq_hour$hour]
cc_price_hour_matrix <- tapply(cc$price,cc[,c("location","hour")],sum)
cc_price_hour <- reshape2::melt(cc_price_hour_matrix)
cc_hour_join <- full_join(cc_freq_hour, cc_price_hour, by= c("location","hour"))
names(cc_hour_join) <- c("location","hour","Freq","Amount")
p3_freq <- ggplot(cc_hour_join,aes(x=hour,y=location))+
geom_tile(aes(fill=Freq))+
scale_fill_gradient(low = "#deeff7", high = "#0D2330")+
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
p3_price <- ggplot(cc_hour_join,aes(x=hour,y=location))+
geom_tile(aes(fill=Amount))+
scale_fill_gradient(low = "#deeff7", high = "#0D2330")+
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank())
plotly::subplot(ggplotly(p3_freq),
ggplotly(p3_price),
shareY = TRUE) %>%
hide_colorbar()
Figure 4: Hourly Consumption Frequency and Amount of Credit Card
From the left hourly heatmap, we can easily identify the popular period for each locations since there are clear pattern.
And some anomalies exist in the strange time period. At 3am, there are 5 credit card usages in “Kronos Mart”. For “Daily Dealz”, the only credit card transaction happened at 6am.
The right heatmap also shows anomalies: the consumption at “Bean There Done That”, “Brewed Awakenings”, “Jack’s Magical Beans” and “Coffee Shack” all happened within 12 o’clock.
knitr::kable(cc %>%
filter(location %in% c("Bean There Done That","Coffee Shack",
"Jack's Magical Beans","Brewed Awakenings")),
caption = "Consumption Recorda at Bean There Done That and Coffee Shack ") %>%
kableExtra::kable_paper("hover", full_width = F) %>%
kableExtra::scroll_box(height = "300px")
| timestamp | location | price | last4ccnum | date | day | hour |
|---|---|---|---|---|---|---|
| 2014-01-06 12:00:00 | Coffee Shack | 51.51 | 7117 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Bean There Done That | 10.28 | 1415 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Bean There Done That | 16.09 | 9635 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Brewed Awakenings | 5.72 | 7688 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Jack’s Magical Beans | 11.03 | 6899 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Brewed Awakenings | 16.89 | 2540 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Bean There Done That | 8.54 | 1877 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Brewed Awakenings | 4.35 | 8332 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Bean There Done That | 5.24 | 1321 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Bean There Done That | 4.27 | 6895 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Jack’s Magical Beans | 18.59 | 9241 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Jack’s Magical Beans | 27.84 | 8156 | 2014-01-06 | 6 | 12 |
| 2014-01-06 12:00:00 | Jack’s Magical Beans | 18.10 | 2463 | 2014-01-06 | 6 | 12 |
| 2014-01-07 12:00:00 | Coffee Shack | 16.63 | 7117 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Bean There Done That | 51.25 | 1415 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Brewed Awakenings | 12.17 | 7688 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Jack’s Magical Beans | 23.68 | 6899 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Brewed Awakenings | 64.84 | 3853 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Brewed Awakenings | 71.59 | 2540 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Bean There Done That | 53.89 | 1877 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Brewed Awakenings | 6.72 | 8332 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Bean There Done That | 8.03 | 1321 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Bean There Done That | 46.25 | 6895 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Jack’s Magical Beans | 18.77 | 9241 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Jack’s Magical Beans | 19.61 | 8156 | 2014-01-07 | 7 | 12 |
| 2014-01-07 12:00:00 | Jack’s Magical Beans | 69.84 | 2463 | 2014-01-07 | 7 | 12 |
| 2014-01-08 12:00:00 | Coffee Shack | 13.13 | 7117 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Bean There Done That | 17.45 | 1415 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Bean There Done That | 12.07 | 9635 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Brewed Awakenings | 11.73 | 7688 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Jack’s Magical Beans | 3.96 | 6899 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Brewed Awakenings | 12.32 | 2540 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Bean There Done That | 8.06 | 1877 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Brewed Awakenings | 15.03 | 8332 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Bean There Done That | 94.96 | 1321 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Jack’s Magical Beans | 18.78 | 9241 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Jack’s Magical Beans | 12.37 | 8156 | 2014-01-08 | 8 | 12 |
| 2014-01-08 12:00:00 | Jack’s Magical Beans | 31.25 | 2463 | 2014-01-08 | 8 | 12 |
| 2014-01-09 12:00:00 | Coffee Shack | 5.01 | 7117 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Bean There Done That | 13.00 | 1415 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Brewed Awakenings | 11.76 | 7688 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Jack’s Magical Beans | 10.11 | 6899 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Jack’s Magical Beans | 6.00 | 3853 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Brewed Awakenings | 13.16 | 2540 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Bean There Done That | 4.06 | 1877 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Brewed Awakenings | 17.66 | 8332 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Bean There Done That | 15.53 | 1321 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Jack’s Magical Beans | 12.13 | 9241 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Jack’s Magical Beans | 15.55 | 8156 | 2014-01-09 | 9 | 12 |
| 2014-01-09 12:00:00 | Jack’s Magical Beans | 37.43 | 2463 | 2014-01-09 | 9 | 12 |
| 2014-01-10 12:00:00 | Bean There Done That | 15.39 | 1415 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Bean There Done That | 3.92 | 9635 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Brewed Awakenings | 13.82 | 7688 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Jack’s Magical Beans | 15.69 | 6899 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Brewed Awakenings | 6.17 | 2540 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Bean There Done That | 13.41 | 1877 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Brewed Awakenings | 13.90 | 8332 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Bean There Done That | 18.67 | 1321 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Bean There Done That | 9.30 | 6895 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Jack’s Magical Beans | 16.29 | 9241 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Jack’s Magical Beans | 5.46 | 8156 | 2014-01-10 | 10 | 12 |
| 2014-01-10 12:00:00 | Jack’s Magical Beans | 8.22 | 2463 | 2014-01-10 | 10 | 12 |
| 2014-01-13 12:00:00 | Coffee Shack | 4.09 | 7117 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 7.93 | 1415 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 98.02 | 9635 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Brewed Awakenings | 12.69 | 7688 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Jack’s Magical Beans | 18.31 | 6899 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Brewed Awakenings | 56.40 | 2540 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 19.56 | 1877 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Brewed Awakenings | 16.25 | 8332 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 5.54 | 1321 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 10.37 | 1874 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 15.24 | 9617 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Bean There Done That | 18.59 | 6895 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Jack’s Magical Beans | 5.22 | 9241 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Jack’s Magical Beans | 9.64 | 8156 | 2014-01-13 | 13 | 12 |
| 2014-01-13 12:00:00 | Jack’s Magical Beans | 17.52 | 2463 | 2014-01-13 | 13 | 12 |
| 2014-01-14 12:00:00 | Coffee Shack | 3.63 | 7117 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Bean There Done That | 7.28 | 1415 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Bean There Done That | 19.37 | 9635 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Brewed Awakenings | 17.14 | 7688 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Brewed Awakenings | 8.30 | 2540 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Bean There Done That | 14.56 | 1877 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Brewed Awakenings | 11.35 | 8332 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Bean There Done That | 67.82 | 1321 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Bean There Done That | 5.90 | 6895 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Jack’s Magical Beans | 63.36 | 9241 | 2014-01-14 | 14 | 12 |
| 2014-01-14 12:00:00 | Jack’s Magical Beans | 95.70 | 8156 | 2014-01-14 | 14 | 12 |
| 2014-01-15 12:00:00 | Bean There Done That | 5.34 | 9635 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Jack’s Magical Beans | 11.15 | 6899 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Brewed Awakenings | 91.62 | 2540 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Bean There Done That | 93.13 | 1877 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Brewed Awakenings | 3.62 | 8332 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Bean There Done That | 18.96 | 1321 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Bean There Done That | 15.24 | 6895 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Jack’s Magical Beans | 5.30 | 9241 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Jack’s Magical Beans | 84.06 | 8156 | 2014-01-15 | 15 | 12 |
| 2014-01-15 12:00:00 | Jack’s Magical Beans | 19.54 | 2463 | 2014-01-15 | 15 | 12 |
| 2014-01-16 12:00:00 | Coffee Shack | 10.48 | 7117 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Bean There Done That | 19.82 | 1415 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Bean There Done That | 12.33 | 9635 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Brewed Awakenings | 17.91 | 7688 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Jack’s Magical Beans | 9.12 | 6899 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Brewed Awakenings | 6.53 | 2540 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Bean There Done That | 5.56 | 1877 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Brewed Awakenings | 6.39 | 8332 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Bean There Done That | 98.34 | 1321 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Bean There Done That | 17.40 | 6895 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Jack’s Magical Beans | 14.86 | 9241 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Jack’s Magical Beans | 12.56 | 8156 | 2014-01-16 | 16 | 12 |
| 2014-01-16 12:00:00 | Jack’s Magical Beans | 6.95 | 2463 | 2014-01-16 | 16 | 12 |
| 2014-01-17 12:00:00 | Coffee Shack | 19.16 | 7117 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Bean There Done That | 19.67 | 1415 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Bean There Done That | 8.21 | 9635 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Brewed Awakenings | 8.16 | 7688 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Jack’s Magical Beans | 7.57 | 6899 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Brewed Awakenings | 88.69 | 2540 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Bean There Done That | 19.56 | 1877 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Brewed Awakenings | 6.00 | 8332 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Bean There Done That | 18.72 | 1321 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Bean There Done That | 5.77 | 6895 | 2014-01-17 | 17 | 12 |
| 2014-01-17 12:00:00 | Jack’s Magical Beans | 19.11 | 9241 | 2014-01-17 | 17 | 12 |
We can see that all consumption records at these locations are exactly at 12:00:00. And these locations seems to be coffee shop. There might be something wrong with the machine which records credit card consumption at the two location.
Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find?
Firstly, filter out the credit card consumption record at “Frydos Autosupply n’ More” on day 13.
knitr::kable(cc %>%
filter(day == 13 & location == "Frydos Autosupply n' More"),
caption = "Consumption Records at Frydos Autosupply n' More on day 13") %>%
kableExtra::kable_paper("hover", full_width = F)
| timestamp | location | price | last4ccnum | date | day | hour |
|---|---|---|---|---|---|---|
| 2014-01-13 19:20:00 | Frydos Autosupply n’ More | 10000.00 | 9551 | 2014-01-13 | 13 | 19 |
| 2014-01-13 19:41:00 | Frydos Autosupply n’ More | 188.57 | 8129 | 2014-01-13 | 13 | 19 |
| 2014-01-13 19:59:00 | Frydos Autosupply n’ More | 64.60 | 8411 | 2014-01-13 | 13 | 19 |
| 2014-01-13 21:11:00 | Frydos Autosupply n’ More | 202.05 | 2418 | 2014-01-13 | 13 | 21 |
The abnormal consumption is from the cc number 9551. Let’s check the consumption records of this cc on day 13.
knitr::kable(cc %>%
filter(day == 13 & last4ccnum == 9551),
caption = "Consumption Records of cc 9551 Owner on Day 13") %>%
kableExtra::kable_paper("hover", full_width = F)
| timestamp | location | price | last4ccnum | date | day | hour |
|---|---|---|---|---|---|---|
| 2014-01-13 06:04:00 | Daily Dealz | 2.01 | 9551 | 2014-01-13 | 13 | 6 |
| 2014-01-13 13:18:00 | U-Pump | 55.25 | 9551 | 2014-01-13 | 13 | 13 |
| 2014-01-13 13:28:00 | Hippokampos | 30.51 | 9551 | 2014-01-13 | 13 | 13 |
| 2014-01-13 19:20:00 | Frydos Autosupply n’ More | 10000.00 | 9551 | 2014-01-13 | 13 | 19 |
| 2014-01-13 19:30:00 | Ouzeri Elian | 28.75 | 9551 | 2014-01-13 | 13 | 19 |
We can see the cc owner make the only only transaction at “Daily Dealz” at early morning (6 o’clock), which is the only one transaction in the two weeks.
Besides, “U-Pump” is a special place because there were only two consumption records in the two weeks, which can be found in Figure 4 and Figure 1. Therefore, there should have fewer stop locations near U-Pump in the car GPS data.
We can check the stop locations on day 13 (all dots in Figure 5). On this day, there was one point near U-Pump where the stop time is near the consumption time in “U-Pump”. The corresponding car id is 24.
Thus, we think that Minke, the owner of car 24, might use the credit card 9551. Let’s draw the moving path of this car to discover more. The stops dots of car 24 are highlight into blue and the its drive path is also plotted with a blue line.
gps2_stop_day13 <- gps2_stop_sf %>%
filter(day ==13)
gps2_stop_car24_day13 <- gps2_stop_sf %>%
filter(day ==13 & id == 24)
gps_path_car24_day13 <- gps_path %>%
filter(day == 13 & id == 24)
map1 <- tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(gps_path_car24_day13) +
tm_lines(col = "blue") +
tm_shape(gps2_stop_day13) +
tm_dots() +
tm_shape(gps2_stop_car24_day13) +
tm_dots(col = "blue", size = 0.1)
tmap_leaflet(map1)
Figure 5: Stop Locations and Driving Path of Car 24 on Day 13
Hovering over blue dots, we can see the stop locations of car 24. On day 13, the car started running at about 7 o’clock from home (the east area in the map) and stopped at “Katerina’s Café” (the south-east area) for half an hour. Then, the car stopped near “Albert’s Fine Clothing” at around noon (the north-west area).
After that, the car stopped near “U-Pump” (the center area) from 12:35 to 13:22. The purchase time in “U-Pump”, 13:18:00, matches the time period.
From 13:27 to 17:57, the car stopped at the GASTech company (south area), which could be the employee was working.
After the work, the car stopped near the “Brew’ve Been Served” (the south-east area) from 18:00 to 19:29. The high consumption occured in this period. The stop location is also close to the “Frydos Autosupply n’ More”. So the driver might stopped the car and walked to the “Frydos Autosupply n’ More” to make the consumption.
There are strange things.
The consumption at “Daily Dealz” occurred at 06:04:00, while the car left home at 07:32:01. It’s strange that the purchase happedned so early and the location can’t be found in other records
The consumption at “Hippokampos” occurred at 13:28:00, while the car stopped at the company at 13:27:14. The time gap is about 30 seconds
The consumption at “Ouzeri Elian” occurred at 19:30:00, while the car left the “Frydos Autosupply n’ More” at 19:29:01. The time gap is just 30 seconds after the car left
We can check the consumption from the combination of credit and loyalty cards data. We use left join to find the corresponding records in the loyalty data.
knitr::kable(cc %>%
filter(day == 13 & last4ccnum == 9551) %>%
left_join(loyalty, by = c("location", "day", "price")),
caption = "Consumption Records of cc 9551 with Corresponding Loyalty Records on Day 13") %>%
kableExtra::kable_paper("hover", full_width = F)
| timestamp.x | location | price | last4ccnum | date | day | hour | timestamp.y | loyaltynum |
|---|---|---|---|---|---|---|---|---|
| 2014-01-13 06:04:00 | Daily Dealz | 2.01 | 9551 | 2014-01-13 | 13 | 6 | NA | NA |
| 2014-01-13 13:18:00 | U-Pump | 55.25 | 9551 | 2014-01-13 | 13 | 13 | NA | NA |
| 2014-01-13 13:28:00 | Hippokampos | 30.51 | 9551 | 2014-01-13 | 13 | 13 | 2014-01-13 | L5777 |
| 2014-01-13 19:20:00 | Frydos Autosupply n’ More | 10000.00 | 9551 | 2014-01-13 | 13 | 19 | NA | NA |
| 2014-01-13 19:30:00 | Ouzeri Elian | 28.75 | 9551 | 2014-01-13 | 13 | 19 | 2014-01-13 | L5777 |
We can see that the two consumption records, which have little time gap with the car leaving/stopping, exactly have corresponding loyalty card usage. But the other three consumption records should be less rush but didn’t use loyalty card. One possible explaination might be the card stealing. This suspicious activity need to be analyzed further in question 5.
The first step is to find the corresponding records.
knitr::kable(cc %>%
filter(location == "Kronos Mart"),
caption = "Consumption at Kronos Mart") %>%
kableExtra::kable_paper("hover", full_width = F)
| timestamp | location | price | last4ccnum | date | day | hour |
|---|---|---|---|---|---|---|
| 2014-01-10 09:30:00 | Kronos Mart | 203.91 | 7688 | 2014-01-10 | 10 | 9 |
| 2014-01-12 03:39:00 | Kronos Mart | 277.26 | 8156 | 2014-01-12 | 12 | 3 |
| 2014-01-13 03:00:00 | Kronos Mart | 147.30 | 5407 | 2014-01-13 | 13 | 3 |
| 2014-01-13 08:01:00 | Kronos Mart | 159.06 | 6816 | 2014-01-13 | 13 | 8 |
| 2014-01-14 08:20:00 | Kronos Mart | 58.85 | 6899 | 2014-01-14 | 14 | 8 |
| 2014-01-16 07:30:00 | Kronos Mart | 298.83 | 7108 | 2014-01-16 | 16 | 7 |
| 2014-01-17 08:08:00 | Kronos Mart | 286.24 | 1415 | 2014-01-17 | 17 | 8 |
| 2014-01-19 03:13:00 | Kronos Mart | 87.66 | 3484 | 2014-01-19 | 19 | 3 |
| 2014-01-19 03:45:00 | Kronos Mart | 194.51 | 9551 | 2014-01-19 | 19 | 3 |
| 2014-01-19 03:48:00 | Kronos Mart | 150.36 | 8332 | 2014-01-19 | 19 | 3 |
The strange consumption records are the second row, the third row and the last 3 rows, which occurred at 3 o’clock.
The records on day 19 belong to the owners of credit cards 3484, 9551, 8332.
Coincidentally, credit card 9551 also appeared in the Anomaly 1.
Day 19 is one day before the employee missing incident. We can check the car stop points in the recent one week to find the reason or any anomalies.
gps2_stop_days <- gps2_stop_sf %>%
filter(between(day,13,18))
gps2_stop_day19 <- gps2_stop_sf %>%
filter(day == 19)
map2 <- tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
alpha = NA,
saturation = 1,
interpolate = TRUE,
max.value = 255) +
tm_shape(gps2_stop_days) +
tm_dots(size = 0.1, alpha = 0.5) +
tm_shape(gps2_stop_day19) +
tm_dots(col = "red", size = 0.1, alpha = 0.5)
tmap_leaflet(map2)
Figure 6: Stop Locations from Day 13 to Day 19
The “Kronos Mart” is located at the west direction with a red symbol. After zooming the map, we can see there were no car stop location near the mart On day 19 (red dot). And several closer red dot, which located at “Roberts and Sons”, were in the afternoon (stop period within 13 to 14 o’clock).
But there are three black dot which are very close to the “Kronos Mart”. Furthermore, The three car stop all started at about 13:30 and ends at about 16:00 on day 18.
The owners of the three cars are listed.
gps2_stop_days %>%
filter((id == 1 | id == 10 |id == 23)
& day == 18
& start > "2014-01-18 13:00:00"
& end < "2014-01-18 16:00:00") %>%
knitr::kable(caption = "The Three Car Stop Near Kronos Mart") %>%
kableExtra::kable_paper("hover", full_width=T) %>%
kableExtra::scroll_box(width = "100%")
| start | end | diff_mins | id | day | idx | geometry | idx1 | LastName | FirstName | CurrentEmploymentType | CurrentEmploymentTitle |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-01-18 13:48:01 | 2014-01-18 15:14:01 | 86.0 mins | 23 | 18 | 661157 | POINT (24.8498 36.06586) | 2665 | Lagos | Varja | Security | Badging Office |
| 2014-01-18 13:29:31 | 2014-01-18 15:52:01 | 142.5 mins | 10 | 18 | 663247 | POINT (24.84983 36.06588) | 2672 | Campo-Corrente | Ada | Executive | SVP/CIO |
| 2014-01-18 13:36:43 | 2014-01-18 15:58:01 | 141.3 mins | 1 | 18 | 663582 | POINT (24.84982 36.06582) | 2674 | Calixto | Nils | Information Technology | IT Helpdesk |
We can’t get insights from the car owner information since they belong to different employment type. But they stayed at the same location for similar time period. They are very likely to meet each other and do the same thing. Besides, the consumption at 3 o’clock came from 3 credit cards and this meetup also involved in 3 persons.
Thus, one possible explanation of the consumption at mid-night could be that the three car owners came to discuss some plans on day 18 and met again at 3 o’clock on day 19.
Another possible explanation direction could be persons just stayed near the mart, so they don’t need to drive and walked there to make consumption. Or the three person use other vehicles, not from the company, to reach the mart and make consumption.
This suspicious activities will be analyzed further in question 5. It might need to check the behaviors of the three car owners in the 14 days.
cc_num <- length(unique(cc$last4ccnum))
loyalty_num <- length(unique(loyalty$loyaltynum))
ppl_num <- length(car_assignments$LastName)
c(cc_num, loyalty_num, ppl_num)
[1] 55 54 44
We can find that there are 44 employees, but 55 credit cards and 54 loyalty cards. If we suppose no errors in the card id, one employee has one or more credit cards and one or more loyalty cards.
Interactive graphs consume a lot of time to load in the page, so this study is divided into two blogs.
To be continued with Part 2.